---
title: Basic model workflow
description: Describes the basic workflow of the DataRobot model building process, with links to complete documentation for each step.

---

# Basic model workflow {: #basic-model-workflow }

Once the import has finished, DataRobot displays the **Data** page. From here you can set a target and change your project settings, then build your models. DataRobot initiates [EDA2](eda-explained) when you start the modeling process.

Generally speaking, once you select a target and click **Start**, DataRobot searches through millions of possible combinations of algorithms, preprocessing steps, features, transformations, and tuning parameters. It then uses supervised learning algorithms to analyze the data and identify (apparent) predictive relationships. DataRobot uses these relationships to predict the value of the target in unseen data, based on the target's relationship to the other dataset variables.

## Model building workflow {: #model-building-workflow }

DataRobot supports both [supervised](glossary/index#supervised-learning) and [unsupervised](glossary/index#unsupervised-learning) learning. The following outlines the steps for building models after [EDA1](eda-explained#eda1) completes, with links to more detailed discussions of each step:

1. (<em>Optional</em>) [Explore](#explore-your-data) your data.
2. (<em>Optional</em>) Investigate the [Data Quality Assessment](data-quality).
3. [Set the target](#set-the-target-feature) feature or set up an [unsupervised learning](unsupervised/index) run by clicking **No target** and selecting [Anomalies](anomaly-detection) or [Clusters](clustering).
4. Add secondary datasets for [Feature Discovery](fd-overview).
5. (<em>Optional</em>) Customize your model build, including:
	* Creating [multiclass models](multiclass).
	* Changing the [optimization metric](additional#change-the-optimization-metric).
	* Setting [advanced model building](adv-opt/index) options.
	* Creating [feature lists](feature-lists) to define feature subsets.
	* Creating [new (transformed) features](#create-new-features).
6. Set the [modeling mode](#set-the-modeling-mode).
7. (<em>Optional</em>) Set up [time-aware modeling](#set-up-time-aware-modeling), if applicable.
8. Start the model build process. (DataRobot provides [special handling](#build-failure) when the project fails after the build process starts.)
9. (<em>Optional</em>) Investigate results of [automated target leakage](quality-check#target-leakage) detection.
10. (<em>Optional</em>) [Rerun](#configure-modeling-settings) modeling with newly configured settings.

    ![](images/data-screen.png)

!!! note
    DataRobot provides [special handling](fast-eda) of larger datasets to make viewing and model building work more efficiently. Specifically, [early target selection](fast-eda#fast-eda-and-early-target-selection) allows you to set build parameters and set the project to start automatically when ingestion completes. For more information, see the sections on [viewing the project summary](manage-projects#project-summaries) and [interpreting summary information](model-ref#data-summary-information).

See the [deep dive](model-ref) for more details on the model building process.

## Explore your data {: #explore-your-data }

Even before you begin the model building process, DataRobot can provide information about your data. After EDA1 completes, you can scroll down or click the **Explore** link to view DataRobot's first analysis of the data. EDA1 provides the following resources for exploring the data:

1. A [Data Quality Assessment](data-quality).

	![](images/dq-2.png)

2. The data (variable) type that DataRobot detects for each feature; supported data types are listed [here](model-ref#data-summary-information). Additional information on the **Data** page includes unique and missing values, mean, median, standard deviation, and minimum and maximum values.

     ![](images/data-screen-1.png)

3. A histogram or table of Frequent Values for a selected feature as well as a dialog to modify the variable type (described in more detail [here](histogram)).

     ![](images/data-screen-2.png)
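
To make the summary columns concrete, the following is a minimal sketch of how such per-feature statistics could be computed for one numeric column. It is not DataRobot's implementation: the `summarize` function, the sample column, and the choice of the sample (rather than population) standard deviation are assumptions made for illustration.

```python
import statistics


def summarize(values):
    """Summarize one numeric column; None marks a missing value (assumption)."""
    present = [v for v in values if v is not None]
    return {
        "unique": len(set(present)),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "std_dev": statistics.stdev(present),  # sample standard deviation (assumption)
        "min": min(present),
        "max": max(present),
    }


# Hypothetical column with two missing entries.
column = [3, 5, None, 5, 9, 2, None, 6]
summary = summarize(column)
print(summary)
```

Missing values are excluded before the statistics are computed, which is one common convention; the page does not state how DataRobot treats them.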


## Set the target feature {: #set-the-target-feature }

The model building phase of the project starts with selecting a target feature. The <em>target feature</em> is the name of the column in the dataset that you would like to predict. Until you select a target, the other **Start** screen configuration options aren't available.

Enter the name of the target feature you would like to predict. DataRobot lists matching features as you type:

![](images/data-uploaded-summary.png)

Alternatively, while exploring your data, notice that when you hover over a feature name a **Use as Target** link appears. Click the link to select that feature as the target.

![](images/use-as-target.png)

When you enter a target, DataRobot displays a histogram providing information about the target feature's distribution.

![](images/eda1-histo.png)

## Customize the model build {: #customize-the-model-build }

If you want to customize the build prior to building, you can modify a variety of advanced parameters (the optimization metric and many others), create [feature lists](feature-lists), and transform features. These options are described below.

### Optimization metric {: #optimization-metric }

The <em>optimization metric</em> defines how to score your models. Once you enter a target, DataRobot selects a default metric based on your data. The metric choice, which becomes visible after you select a target variable, is listed under the **Start** button. You can change the optimization metric through the [**Advanced options**](additional#change-the-optimization-metric) link.

Note that although you choose and build a project optimized for a specific metric, DataRobot computes many applicable metrics on each of the models. After the build completes, you can redisplay the Leaderboard listing based on a different metric. Doing so does not change any values within the models; it simply reorders the model listing based on performance on the alternate metric.
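
The reordering behavior can be sketched in a few lines. This is an illustration only, not how the Leaderboard is implemented: the model names, scores, and the `rank_by` helper are hypothetical, and LogLoss and AUC are used simply as two example metrics with opposite "better" directions.

```python
# Hypothetical leaderboard rows; names and scores are invented for illustration.
leaderboard = [
    {"model": "Model A", "LogLoss": 0.41, "AUC": 0.86},
    {"model": "Model B", "LogLoss": 0.39, "AUC": 0.84},
    {"model": "Model C", "LogLoss": 0.45, "AUC": 0.88},
]

# Whether a higher or lower score is better depends on the metric.
HIGHER_IS_BETTER = {"LogLoss": False, "AUC": True}


def rank_by(rows, metric):
    """Return a new list ordered best-first by `metric`; scores are not modified."""
    return sorted(rows, key=lambda r: r[metric], reverse=HIGHER_IS_BETTER[metric])


by_logloss = [r["model"] for r in rank_by(leaderboard, "LogLoss")]
by_auc = [r["model"] for r in rank_by(leaderboard, "AUC")]
print(by_logloss)
print(by_auc)
```

The same three models appear in both listings with unchanged scores; only their order differs.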

### Improve accuracy {: #improve-accuracy }

If accuracy is a prime concern, consider selecting the "accuracy-optimized metablueprint" checkbox in [**Advanced options**](additional#time-limit-exceptions) prior to model building. Using this feature causes model building to run much more slowly, but potentially produces more accurate blueprints. (For example, with this option you may get XGBoost models with many more trees but a lower learning rate or with a deeper grid search.)

### Other advanced options {: #other-advanced-options }

The [**Show advanced options**](adv-opt/index) link allows you to set far more than the optimization metric. From there you can:

* Set [partitioning options](partitioning)
* Enable [Smart Downsampling](smart-ds)
* Set a variety of [additional parameters](additional), including weights, offset/exposure, running time limits, and more


### Create new features {: #create-new-features }

DataRobot supports two types of transformations: automatic and manual. The software automatically creates derived features from any column that it identifies as variable type `Date`. DataRobot also supports user-created transformations, which you can then include in your feature lists. See the more detailed description of [transformations](feature-transforms) for more information.
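
As a rough illustration of what deriving features from a date column means, the sketch below expands a single date into several calendar components. The specific set of derived features and the `derive_date_features` helper are assumptions for this example; see the transformations documentation for what DataRobot actually creates.

```python
from datetime import date


def derive_date_features(d):
    """Derive simple calendar features from one date value (illustrative set)."""
    return {
        "year": d.year,
        "month": d.month,
        "day_of_month": d.day,
        "day_of_week": d.isoweekday(),  # 1 = Monday ... 7 = Sunday
    }


features = derive_date_features(date(2024, 3, 15))
print(features)
```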

## Set up time-aware modeling {: #set-up-time-aware-modeling }

For projects where time is an important dimension, DataRobot provides an option to create [time-aware models](time/index)&mdash;models that use time for validation (OTV) or forecasting (time series). You can use out-of-time validation (OTV) and Automated Time Series modeling to predict individual events and to use time to validate performance for future data. Options for time-aware modeling become available after you select a target feature and <em>if</em> DataRobot detects a date/time feature in your dataset. If there are no time features, the option is grayed out and you can continue the modeling workflow.
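
The idea behind using time for validation can be sketched as a split in which the model trains on earlier rows and is validated on later ones. This is a conceptual illustration only: the `out_of_time_split` helper, the `sold_on` and `units` field names, and the single cutoff date are hypothetical and do not reflect how DataRobot configures its time-aware partitions.

```python
from datetime import date


def out_of_time_split(rows, date_key, cutoff):
    """Train on rows dated before `cutoff`; validate on rows on or after it."""
    ordered = sorted(rows, key=lambda r: r[date_key])
    train = [r for r in ordered if r[date_key] < cutoff]
    validation = [r for r in ordered if r[date_key] >= cutoff]
    return train, validation


# Hypothetical rows, deliberately out of chronological order.
rows = [
    {"sold_on": date(2024, 3, 1), "units": 12},
    {"sold_on": date(2024, 1, 1), "units": 7},
    {"sold_on": date(2024, 2, 1), "units": 9},
    {"sold_on": date(2024, 4, 1), "units": 15},
]
train, validation = out_of_time_split(rows, "sold_on", date(2024, 3, 1))
print([r["units"] for r in train])
print([r["units"] for r in validation])
```

Because validation rows are always later than training rows, the score reflects how the model performs on data from a period it has not seen.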

## Set the modeling mode {: #set-the-modeling-mode }

!!! note
    See the [multistage Autopilot](multistep-ta) description for time-aware modeling.


By default, DataRobot runs Quick (Autopilot)&mdash;a shortened and optimized version of the full Autopilot mode. In Autopilot, DataRobot selects a predefined set of models to run based on the specified target feature and then trains the models on the training data set. Sample percentage sizes are based on the selected mode (see the table below) and time-aware setting.


For example, in full Autopilot, DataRobot first builds the selected models using 16% of the total data. After scoring those models, DataRobot selects the top 16 and reruns them on 32% of the data. It then takes the top 8 models from that run and trains them on 64% of the data (or 500 MB of data, whichever is smaller). Results of all model runs, at all sample sizes, are displayed on the Leaderboard. This method supports running more models in the early stages and advancing only the top models to the next stage, allowing for greater model diversity and faster Autopilot runtimes. See the notes on [calculating Autopilot stages](repository#notes-on-sample-size) for more detail.

When running Autopilot, DataRobot initially caps the sample size at 500 MB. Once it selects a model for deployment, that model is rerun at 80% (exceeding the previous 500 MB cap). Note that you can train any model to any sample size (exceeding 500 MB) from the **Repository** or [retrain models](creating-addl-models) to any size from the Leaderboard.
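
The stage arithmetic for full Autopilot can be worked through with a short sketch. The percentages and the 500 MB cap come from the description above; the `stage_samples_mb` helper is hypothetical, and applying the cap at every stage (not only the 64% stage) is an assumption based on the statement that Autopilot initially caps the sample size.

```python
STAGE_PERCENTS = (16, 32, 64)  # sample sizes from the full Autopilot example
SAMPLE_CAP_MB = 500            # initial Autopilot sample-size cap


def stage_samples_mb(dataset_mb):
    """Return the training sample size in MB for each Autopilot stage.

    Assumption: the cap applies to every stage.
    """
    return [min(dataset_mb * pct / 100, SAMPLE_CAP_MB) for pct in STAGE_PERCENTS]


small = stage_samples_mb(100)    # the cap is never reached
large = stage_samples_mb(2000)   # the later stages hit the cap
print(small)
print(large)
```

For a 100 MB dataset the stages use 16, 32, and 64 MB. For a 2000 MB dataset the first stage uses 320 MB, and the remaining stages are held at 500 MB under the stated assumption.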

For more control over which models are run, use the additional options beneath the **Start** button. For large datasets, see the section on [early target selection](fast-eda#fast-eda-and-early-target-selection).

![](images/start-with-options.png)

!!! note
    See the table of [differences applied](model-ref#small-datasets) when working with smaller datasets.

### Modeling modes explained {: #modeling-modes-explained }

The following table describes each of the modeling modes:

| Modeling mode | Description |
|---------------|-------------|
| Quick (default) | Using a sample size of 64%, Quick Autopilot runs a [subset](model-ref#specifics-of-quick-autopilot) of models, based on the specified target feature and performance metric, to provide a base set of models that build and provide insights quickly. |
| Autopilot | In full Autopilot mode, DataRobot selects the best predictive models for the specified target feature. By default, Autopilot runs on the [Informative Features](feature-lists#automatically-created-feature-lists) feature list. |
| Manual | Manual mode gives you full control over which blueprints to execute. When you select Manual mode, DataRobot provides a message and link to the Repository after [EDA2](eda-explained) completes. |
| Comprehensive | [Comprehensive Autopilot mode](more-accuracy) runs all Repository blueprints on the maximum Autopilot sample size to ensure more accuracy for models. This mode results in extended build times. Note that you cannot use Comprehensive Autopilot mode for time series or [anomaly detection](anomaly-detection) projects. |

## Start the build {: #start-the-build }

To start the build, select a [feature list](feature-lists):

![](images/feature-lists-active.png)

Then, select a [modeling mode](#set-the-modeling-mode) and click **Start** to initiate [EDA2](eda-explained). When the modeling process begins, DataRobot indicates the activity with a spinning icon by the **Models** tab. As models complete, a badge count also appears:

![](images/model-count.png)

The modeling process finds the best predictive models for the target feature. You can manage the build using the DataRobot [Worker Queue](worker-queue). If projects [fail to build](#build-failure), DataRobot provides information, including a traceback that can be sent to Support.

As models build, you can [explore the EDA2 data](model-ref#data-summary-information) DataRobot is using from the **Project Data** tab. Once complete, you can also [work with feature lists](feature-lists) or visualize [associations](feature-assoc) within your data from the **Data** page.

!!! note
    If you close your browser or log out, DataRobot continues building models in any projects that have started the model building phase.

## Build failure {: #build-failure }

After you load data, set a target, and select options, it is possible that your project fails to build (due to data format errors, for example). When this happens, DataRobot provides the information necessary to help troubleshoot the problem, whether on your own or with the help of Support. Errored projects, while not built, are saved to the [Manage Projects](manage-projects) inventory with their traceback information. This lets you debug or repair issues without losing any feature engineering or other customization you may have performed.

On first failure, DataRobot presents a dialog with:

* a brief error message
* the option to view traceback details by expanding the **Details** link
* the ability to dismiss the dialog

![](images/aim-fail-1.png)

Once dismissed, DataRobot provides a preliminary summary of project data with a message indicating that project creation failed. Click the **CONTACT SUPPORT** link to see the information available, then click **Submit** to send the information to the Support team. (For organizations that are not configured for direct contact to Support through the application, clicking the link opens your mail client.)

![](images/aim-fail-2.png)

At this point, you can continue working on other projects while Support investigates your issue. To revisit the failed project, open [**Manage Projects**](manage-projects). The failed project is marked with an icon indicating an issue:

![](images/aim-fail-3.png)

Select the project to return to the preliminary project data summary page. From here you can open the Support contact link or view your traceback.

## Configure modeling settings {: #configure-modeling-settings }

When modeling completes, you can rerun the process with new settings in Autopilot, Quick, or Comprehensive mode. Select **Configure modeling settings** in the right-side panel.

![](images/re-run-1.png)

* Select the [modeling mode](#set-the-modeling-mode): Autopilot, Quick, Manual, or Comprehensive.

* Choose the [feature list](feature-lists) used for modeling.

* Determine the [automation settings](additional): choose to only include blueprints with Scoring Code support, create blenders from top models, and recommend models for deployment.

![](images/rerun-2.png)

Once configured, click **Rerun** to restart the modeling process.
